Lesson 5. Data-driven Mapping

Data-driven mapping refers to the process of using data values to determine the symbology of mapped features. Color, shape, and size are the three most common graphic elements used to symbolize data-driven maps. Data-driven maps are often referred to as thematic maps.



Types of Thematic Maps

There are two primary types of thematic maps:

  • Choropleth maps: set the color of areas (polygons) by data value

  • Point symbol maps: set the color or size of points by data value

We review both of these types of maps in more detail in this lesson. First, let’s take a quick look at choropleth maps.

library(sf)
library(tmap)
library(here)

5.1 Choropleth Maps

Choropleth maps are the most common type of thematic map.

Let’s use an sf data.frame of counties data to make a choropleth map.

First, read in the counties data with the st_read function.

counties <- st_read(here("notebook_data",
                         "california_counties",
                         "CaliforniaCounties.shp"))
## Reading layer `CaliforniaCounties' from data source 
##   `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\california_counties\CaliforniaCounties.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 58 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022
## Projected CRS: NAD83 / California Albers

Then, make a map of our counties.

plot(counties$geometry)

Now, take a look at the spatial dataframe.

head(counties)
## Simple feature collection with 6 features and 58 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## Projected CRS: NAD83 / California Albers
##   FID_        NAME STATE_NAME POP2010 POP10_SQMI POP2012  POP12_SQMI   WHITE
## 1    0        Kern California  839631      102.9  851089  104.282870  499766
## 2    0       Kings California  152982      109.9  155039  111.427421   83027
## 3    0        Lake California   64665       48.6   65253   49.082334   52033
## 4    0      Lassen California   34895        7.4   35039    7.422856   25532
## 5    0 Los Angeles California 9818605     2402.3 9904341 2423.264150 4936599
## 6    0      Madera California  150865       70.1  153025   71.065672   94456
##    BLACK AMERI_ES   ASIAN HAWN_PI HISPANIC   OTHER MULT_RACE   MALES FEMALES
## 1  48921    12676   34846    1252   413033  204314     37856  433108  406523
## 2  11014     2562    5620     271    77866   42996      7492   86344   66638
## 3   1232     2049     724     108    11088    5455      3064   32469   32196
## 4   2834     1234     356     165     6117    3562      1212   22416   12479
## 5 856874    72828 1346865   26094  4687889 2140632    438713 4839654 4978951
## 6   5629     4136    2802     162    80992   37380      6300   72682   78183
##   AGE_UNDER5 AGE_5_9 AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44
## 1      72885   68694     68473     72493     65339    122046    108500
## 2      12877   11564     11324     11356     13158     25589     21878
## 3       3633    3574      3888      4190      3362      6603      7095
## 4       1625    1595      1853      2107      2831      6337      5513
## 5     645793  633690    678845    753630    752788   1475731   1430326
## 6      11983   11756     11755     12224     11032     20562     19167
##   AGE_45_54 AGE_55_64 AGE_65_74 AGE_75_84 AGE_85_UP MED_AGE MED_AGE_M MED_AGE_F
## 1    108479     77285     43502     23473      8462    30.7      30.2      31.4
## 2     20282     12924      6844      3809      1377    31.1      31.9      30.0
## 3     10255     10625      6553      3502      1385    45.0      43.8      45.9
## 4      5447      4113      1984      1041       449    37.0      35.5      40.9
## 5   1368947   1013156    568470    345603    151626    34.8      33.6      35.9
## 6     19291     15833      9868      5468      1926    33.1      31.5      34.3
##   HOUSEHOLDS AVE_HH_SZ HSEHLD_1_M HSEHLD_1_F MARHH_CHD MARHH_NO_C MHH_CHILD
## 1     254610      3.15      23580      25629     74254      58472     12615
## 2      41233      3.19       3313       3884     12993       9299      2115
## 3      26548      2.39       3913       3970      4061       7381       926
## 4      10058      2.50       1348       1231      2068       3094       413
## 5    3241204      2.98     360530     424398    790374     690291    115984
## 6      43317      3.28       3342       3909     12660      12558      2056
##   FHH_CHILD FAMILIES AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC
## 1     28989   191739       3.61    284367  29757    152828     101782
## 2      4818    31939       3.59     43867   2634     22329      18904
## 3      2051    16255       2.94     35492   8944     17472       9076
## 4       710     6800       2.98     12710   2652      6590       3468
## 5    296976  2194080       3.58   3445076 203872   1544749    1696455
## 6      4020    34093       3.63     49140   5823     27726      15591
##   NO_FARMS07 AVG_SIZE07 CROP_ACR07 AVG_SALE07    SQMI CountyFIPS
## 1       2117       1116     942827    1513.53 8161.35      06103
## 2       1129        603     512870    1203.20 1391.39      06089
## 3        845        147      28997      72.31 1329.46      06106
## 4        459       1000      82567     120.92 4720.42      06086
## 5       1734         63      49158     187.94 4087.19      06073
## 6       1708        398     290683     579.70 2153.29      06102
##                    NEIGHBORS PopNeigh NEIGHBOR_1 PopNeigh_1 NEIGHBOR_2
## 1 San Bernardino,Tulare,Inyo  2495935       <NA>         NA       <NA>
## 2         Fresno,Kern,Tulare  2212260       <NA>         NA       <NA>
## 3                       <NA>        0       <NA>         NA       <NA>
## 4                       <NA>        0       <NA>         NA       <NA>
## 5        San Bernardino,Kern  2874841       <NA>         NA       <NA>
## 6                Mono,Fresno   944652       <NA>         NA       <NA>
##   PopNeigh_2                       geometry
## 1         NA MULTIPOLYGON (((193446 -244...
## 2         NA MULTIPOLYGON (((12524.03 -1...
## 3         NA MULTIPOLYGON (((-240632.1 9...
## 4         NA MULTIPOLYGON (((-45364.03 3...
## 5         NA MULTIPOLYGON (((173874.5 -4...
## 6         NA MULTIPOLYGON (((16681.21 -1...

In particular, we are interested in the columns with numeric values, as these are the ones typically used to make data-driven maps.

To get started, let’s create a choropleth map by setting the color of each county based on the value in the population per square mile column (POP12_SQMI).

Recall that sf’s plot method does this by default! So, here’s the quickest way to make a choropleth:

plot(counties['POP12_SQMI'])

By default, sf::plot linearly scales the colors to the data values. This is called a proportional color map.
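
To make the idea of linear scaling concrete, here is a minimal base R sketch (it is not sf's internal code). It rescales values to the 0-1 range and looks each one up on a continuous color ramp. The density values are copied from the six rows printed by head(counties) above; the two ramp colors and the helper name value_to_color are arbitrary choices made for this illustration.

```r
# POP12_SQMI for the six counties printed by head(counties)
density <- c(Kern = 104.282870, Kings = 111.427421, Lake = 49.082334,
             Lassen = 7.422856, `Los Angeles` = 2423.264150, Madera = 71.065672)

# Hypothetical helper: map values linearly onto a two-color ramp
value_to_color <- function(x, low = "lightyellow", high = "darkred") {
  scaled <- (x - min(x)) / (max(x) - min(x))  # 0 = smallest value, 1 = largest
  ramp <- colorRamp(c(low, high))             # function taking numbers in [0, 1]
  rgb_matrix <- ramp(scaled)                  # one row of R, G, B per value
  rgb(rgb_matrix[, 1], rgb_matrix[, 2], rgb_matrix[, 3], maxColorValue = 255)
}

county_colors <- value_to_color(density)
names(county_colors) <- names(density)
county_colors
```

In this small sample, Los Angeles takes the darkest color while the other five counties sit close to the light end of the ramp, because their densities are tiny relative to the maximum.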

Choropleth mapping with tmap

We can also use tmap to create thematic maps. This package gives us greater control over the visualization details.

In tmap, instead of setting the col argument to the same static value (e.g. ‘red’, ‘#ef03a5’) for all features, we can set it to the name of the column by which we want our polygons colored (e.g. ‘POP12_SQMI’).

# Set the mapping mode to a static plot (not interactive)
tmap_mode('plot')  
## tmap mode set to plotting
# Map the county polygons colored by the values in the POP12_SQMI column
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              title = "Population Density per mi^2")

By default, tmap uses a yellow-orange-brown (YlOrBr) sequential color palette for thematic maps and bins those colors into 3 to 7 classes of approximately equal intervals with rounded values for class breaks.
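
The idea of equal intervals with rounded class breaks can be illustrated with base R's pretty() function, which picks evenly spaced round numbers that cover a range. This is only a sketch of the concept, using the smallest and largest of the six densities printed by head(counties) above; tmap's own defaults may produce a different number of classes for the full dataset.

```r
# Smallest and largest of the six POP12_SQMI values printed earlier
density_range <- c(7.422856, 2423.264150)

# Ask for roughly five evenly spaced, rounded intervals covering that range
breaks <- pretty(density_range, n = 5)
breaks

# Every interval has the same width, and the breaks span the data
diff(breaks)
range(breaks)
```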

Of course, we can also use tmap’s interactive mapping mode. Do you recall the syntax for:

  • setting the tmap mode to static vs interactive mapping?

  • or toggling between these two modes?

Let’s make an interactive map, making our layer partially transparent, i.e. alpha = 0.5, so that we can see the basemap through our polygons.

  • This transparency may be more or less noticeable depending on the selected basemap!
tmap_mode('view')
## tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)  # check for invalid polygons and repair them so tmap can draw them

tm_shape(counties) +
  tm_polygons(col='POP12_SQMI', alpha=0.5,
              title = "Population Density per mi^2")
## Warning: The shape counties is invalid (after reprojection). See sf::st_is_valid

That’s really the heart of creating a choropleth map with tmap. To set the color of the features based on the values in a column, set the col argument to the column name in the sf data.frame (quoted as a string!).

Practice

Redo the map above, but mapping population (POP2012) NOT population density.

# Map of County Population

Question

Which map better conveys county population - POP12_SQMI or POP2012?

The Challenge of Thematic Maps

The goal of a thematic map is to use color to visualize the spatial distribution of a variable.

Another goal is to use color to effectively and quickly convey information. For example,

  • maps use brighter or richer colors to signify higher values,

  • and leverage cognitive associations such as mapping water with the color blue.

There are two major challenges when creating thematic maps:

  1. Our eyes are drawn to the color of larger areas or linear features, even if the values of smaller features are more significant.

  2. The range of data values is rarely evenly distributed across all observations and thus the colors can be misleading.

Questions

  • Do you see either of these problems in our population-density map?

    • Take a look at the histogram below as you consider the above question.
hist(counties$POP12_SQMI,
     breaks = 40, 
     main = 'Population Density per mi^2')
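
A quick numeric check can back up what the histogram shows. The sketch below uses only the six POP12_SQMI values printed by head(counties) earlier, so the numbers are illustrative rather than statewide statistics. It compares the mean with the median; a mean far above the median is a common sign of a right-skewed distribution.

```r
# Six POP12_SQMI values copied from the head(counties) output
density <- c(104.282870, 111.427421, 49.082334, 7.422856, 2423.264150, 71.065672)

mean(density)    # pulled upward by the single very dense county
median(density)  # closer to a "typical" county in this sample

# Share of counties that fall below the mean
mean(density < mean(density))
```

With the full dataset loaded, summary(counties$POP12_SQMI) gives the same comparison for all 58 counties.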

There are three main techniques for dealing with these mapping challenges:

  1. Color palettes

  2. Data transformations

  3. Classification schemes

5.2 Color Palettes

There are three main types of color palettes (or color maps), each of which has a different purpose:

  • diverging - a “diverging” set of colors is used to emphasize mid-range values as well as extremes.

  • sequential - usually a single hue or a multi-hue progression, used to emphasize differences in order and magnitude, where darker colors typically mean higher values.

  • qualitative - a contrasting set of colors, used to identify distinct categories and avoid implying quantitative significance.

Tip: Sites like ColorBrewer let you play around with different types of color maps.

To see the names of all color palettes available to tmap, try the following command. You may need to enlarge the output image.

RColorBrewer::display.brewer.all()
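
For a quick look at the three palette types without any extra packages, base R's hcl.colors() (available in R 3.6 and later) can generate one of each kind. The palette names below are one arbitrary choice per type and are not the palettes tmap uses by default.

```r
n <- 5

sequential  <- hcl.colors(n, palette = "Blues")     # one hue, varying in lightness
diverging   <- hcl.colors(n, palette = "Blue-Red")  # two hues on either side of a midpoint
qualitative <- hcl.colors(n, palette = "Set 2")     # distinct hues with no implied order

palettes <- list(sequential = sequential,
                 diverging = diverging,
                 qualitative = qualitative)
palettes
```

Each element is a character vector of hex color codes, which is one of the forms the palette argument of the tmap layer functions used in this lesson can take.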

As a best practice, a qualitative color palette should not be used with quantitative data and vice versa. For example, consider this map that EDM.com published of top dance tracks by state.

5.3 Transforming Count Data

For a number of reasons, data are often distributed in aggregated form. For example, the Census Bureau collects data from individual people, households and businesses and distributes it aggregated to states, counties, and census tracts, etc.

When the aggregated data are counts, like total population, they can be transformed to densities, proportions and ratios. These normalized variables are more comparable across regions that differ greatly in size.

Let’s consider this in terms of our data.

  • Counts
    • data counts, aggregated by feature
      • e.g. population within a county
  • Densities
    • counts aggregated by feature and normalized by feature area
      • e.g. population per square mile within a county
  • Proportions / Percentages
    • value in a specific category divided by total value across all categories
      • e.g. proportion of the county population that is white compared to the total county population
  • Rates / Ratios
    • value in one category divided by value in another category
      • e.g. homeowner-to-renter ratio would be calculated as the number of homeowners divided by the number of renters (c_owners / c_renters)

The basic cartographic rule is that when mapping areas that differ in size you never map counts, since those differences in size make the comparison less valid.
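
To ground these definitions, the sketch below recomputes a density, a proportion, and a ratio from raw counts. The numbers are copied from the Kern and Kings rows of the head(counties) output above; the small data frame and its lowercase column names are made up for this illustration, and the proportion assumes the WHITE count comes from the same census as POP2010.

```r
# Counts copied from the first two rows of head(counties)
counts <- data.frame(
  name       = c("Kern", "Kings"),
  pop2012    = c(851089, 155039),
  pop2010    = c(839631, 152982),
  white      = c(499766, 83027),
  owner_occ  = c(152828, 22329),
  renter_occ = c(101782, 18904),
  sqmi       = c(8161.35, 1391.39)
)

# Density: count normalized by area
counts$density <- counts$pop2012 / counts$sqmi

# Proportion: one category divided by the total across categories
counts$prop_white <- counts$white / counts$pop2010

# Ratio: one category divided by another category
counts$owner_renter <- counts$owner_occ / counts$renter_occ

counts[, c("name", "density", "prop_white", "owner_renter")]
```

The computed density values agree with the POP12_SQMI column in the printed output, which suggests that column was derived the same way.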

5.4 Classification schemes

Another way to make more meaningful maps is to improve the way in which data values are mapped to colors.

The common alternative to a proportional color map is to use a classification scheme to create a graduated color map. This is the standard way to create a choropleth map.

A classification scheme is a method for binning continuous data values into classes, typically 4-7 (the default is 5), and mapping those classes to a color palette.

Commonly used classification schemes:

  • Equal intervals or Pretty
    • equal-size data ranges (e.g., values within 0-10, 10-20, 20-30, etc.)
    • pros:
      • best for data spread across entire range of values
      • easily understood by map readers
    • cons:
      • poorly suited to highly skewed data or data with a few big outliers
  • Quantiles
    • equal number of observations in each bin
    • pros:
      • looks nice, because it best spreads colors across the full set of data values
      • thus, it’s often the default scheme for mapping software
    • cons:
      • bin ranges based on the number of observations, not on the data values
      • thus, different classes can have very similar or very different values.
  • Natural breaks
    • minimize within-class variance and maximize between-class differences
    • e.g. ‘fisher-jenks’
    • pros:
      • great for exploratory data analysis, because it can identify natural groupings
    • cons:
      • class breaks are best fit to one dataset, so the same bins can’t always be used for multiple years
  • Head/Tails
    • a relatively new scheme for data with a heavy-tailed distribution
  • Manual
    • classifications are user-defined
    • pros:
      • especially useful if you want to slightly change the breaks produced by another scheme
      • can be used as a fixed set of breaks to compare data over time
    • cons:
      • more work involved
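
The difference between equal intervals and quantiles is easiest to see by counting how many observations land in each class. Here is a minimal base R sketch using the six POP12_SQMI values printed by head(counties) and three classes (both choices are just to keep the example small; this is not how tmap computes its classes internally).

```r
# Six POP12_SQMI values copied from the head(counties) output
density <- c(104.282870, 111.427421, 49.082334, 7.422856, 2423.264150, 71.065672)
n_classes <- 3

# Equal intervals: split the data range into n_classes bins of equal width
equal_breaks <- seq(min(density), max(density), length.out = n_classes + 1)

# Quantiles: choose breaks so each bin holds about the same number of values
quantile_breaks <- quantile(density,
                            probs = seq(0, 1, length.out = n_classes + 1),
                            names = FALSE)

# cut() assigns each value to a bin; include.lowest keeps the minimum in bin 1
equal_class    <- cut(density, breaks = equal_breaks, include.lowest = TRUE)
quantile_class <- cut(density, breaks = quantile_breaks, include.lowest = TRUE)

table(equal_class)     # counts per equal-interval bin
table(quantile_class)  # counts per quantile bin
```

With this skewed sample, the equal-interval bins hold 5, 0, and 1 counties, while the quantile bins hold 2 each. That is the trade-off described in the list above: equal intervals are easy to read but waste classes on skewed data, and quantiles use every class but can group very different values together.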

Classification schemes and tmap

Classification schemes can be implemented using the tmap geometry functions (tm_polygons, tm_dots, etc.) by setting a value for the style argument.

Here are some of the tmap keyword names for classification styles that we can use (from the docs: ?tm_polygons):

  • equal, quantile, fisher, jenks, headtails, fixed, kmeans, pretty.

For more information about these classification schemes see ?classIntervals or sources such as this page in the Lovelace, Nowosad, and Muenchow ebook.
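
As one illustration of what a style keyword stands for, here is a simplified sketch of the head/tails idea: split the data at its mean, keep the values above the mean (the "head"), and repeat while the head remains a small minority. The function name head_tails_breaks, the 40% cutoff, and the sample values are all assumptions made for this example; the implementation behind the headtails style may differ in its details.

```r
# Hypothetical helper: simplified head/tails class breaks
head_tails_breaks <- function(x, head_limit = 0.4) {
  breaks <- min(x)
  repeat {
    m <- mean(x)
    head_part <- x[x > m]        # values above the mean form the "head"
    breaks <- c(breaks, m)       # each mean becomes a class break
    # Stop when the head is too small to split or is no longer a clear minority
    if (length(head_part) < 2 || length(head_part) / length(x) > head_limit) break
    x <- head_part               # repeat the split on the head only
  }
  c(breaks, max(x))
}

# Made-up heavy-tailed values: many small numbers and a few very large ones
values <- c(1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 40, 100)

breaks <- head_tails_breaks(values)
breaks
table(cut(values, breaks = breaks, include.lowest = TRUE))
```

Most values fall into the first class, and the few large values are separated into their own classes, which is the behavior this scheme is designed for.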


Classification schemes in action

Let’s redo the last map using the quantile classification scheme.

  • What is different about the code? About the output map?
tmap_mode('plot')
## tmap mode set to plotting
# Plot population density - mile^2
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              style = "quantile",
              alpha = 0.5,
              title = "Population Density per mi^2")
## Warning: The shape counties is invalid. See sf::st_is_valid

Practice

Redo the previous map with these classification schemes: headtails, equal, jenks

  • Which one do you like best?

User Defined Classification Schemes

You may get pretty close to your final map without being completely satisfied. In this case you can manually define a classification scheme.

Let’s customize our map with a user-defined classification scheme where we manually set the breaks for the bins using style = 'fixed' together with the breaks argument.

tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              palette = "YlGn", 
              style = 'fixed',
              breaks = c(0, 50, 100, 200, 300, 400, max(counties$POP12_SQMI)),
              title = "Population Density per Square Mile")
## Warning: The shape counties is invalid. See sf::st_is_valid

Since we are customizing our plot, we can also edit our legend to specify the text, so that it’s easier to read.

  • We’ll use tm_add_legend to build our own customized legend.
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              palette = "YlGn", 
              style='fixed',
              breaks = c(0, 50, 100, 200, 300, 400, max(counties$POP12_SQMI)),
              legend.show = FALSE) +
tm_add_legend('fill', col = RColorBrewer::brewer.pal(6, "YlGn"),
              border.col = "black",
              title = "Population Density per Sq Mile",
              labels = c('<50','50 to 100','100 to 200','200 to 300','300 to 400','>400'))
## Warning: The shape counties is invalid. See sf::st_is_valid
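
Typing legend labels by hand is error-prone when the breaks change. One option, sketched here in base R, is to build the labels from the breaks vector. The value 2423.26 is the largest of the six densities printed earlier and only stands in for max(counties$POP12_SQMI); the label wording is just one possible format.

```r
# Stand-in for the fixed breaks used in the map above
breaks <- c(0, 50, 100, 200, 300, 400, 2423.26)

lower <- head(breaks, -1)  # left edge of each class
upper <- tail(breaks, -1)  # right edge of each class

# Interior classes read "a to b"; the first and last are open-ended
labels <- paste(lower, "to", upper)
labels[1] <- paste0("<", upper[1])
labels[length(labels)] <- paste0(">", lower[length(lower)])

labels
```

The resulting character vector could be passed to the labels argument of tm_add_legend in place of the hand-typed one.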

Let’s plot a ratio

If we look at the columns in our dataset, we see we have a number of variables from which we can calculate proportions, rates, and the like.

Let’s try that out:

head(counties)
## Simple feature collection with 6 features and 58 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## Projected CRS: NAD83 / California Albers
##   FID_        NAME STATE_NAME POP2010 POP10_SQMI POP2012  POP12_SQMI   WHITE
## 1    0        Kern California  839631      102.9  851089  104.282870  499766
## 2    0       Kings California  152982      109.9  155039  111.427421   83027
## 3    0        Lake California   64665       48.6   65253   49.082334   52033
## 4    0      Lassen California   34895        7.4   35039    7.422856   25532
## 5    0 Los Angeles California 9818605     2402.3 9904341 2423.264150 4936599
## 6    0      Madera California  150865       70.1  153025   71.065672   94456
##    BLACK AMERI_ES   ASIAN HAWN_PI HISPANIC   OTHER MULT_RACE   MALES FEMALES
## 1  48921    12676   34846    1252   413033  204314     37856  433108  406523
## 2  11014     2562    5620     271    77866   42996      7492   86344   66638
## 3   1232     2049     724     108    11088    5455      3064   32469   32196
## 4   2834     1234     356     165     6117    3562      1212   22416   12479
## 5 856874    72828 1346865   26094  4687889 2140632    438713 4839654 4978951
## 6   5629     4136    2802     162    80992   37380      6300   72682   78183
##   AGE_UNDER5 AGE_5_9 AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44
## 1      72885   68694     68473     72493     65339    122046    108500
## 2      12877   11564     11324     11356     13158     25589     21878
## 3       3633    3574      3888      4190      3362      6603      7095
## 4       1625    1595      1853      2107      2831      6337      5513
## 5     645793  633690    678845    753630    752788   1475731   1430326
## 6      11983   11756     11755     12224     11032     20562     19167
##   AGE_45_54 AGE_55_64 AGE_65_74 AGE_75_84 AGE_85_UP MED_AGE MED_AGE_M MED_AGE_F
## 1    108479     77285     43502     23473      8462    30.7      30.2      31.4
## 2     20282     12924      6844      3809      1377    31.1      31.9      30.0
## 3     10255     10625      6553      3502      1385    45.0      43.8      45.9
## 4      5447      4113      1984      1041       449    37.0      35.5      40.9
## 5   1368947   1013156    568470    345603    151626    34.8      33.6      35.9
## 6     19291     15833      9868      5468      1926    33.1      31.5      34.3
##   HOUSEHOLDS AVE_HH_SZ HSEHLD_1_M HSEHLD_1_F MARHH_CHD MARHH_NO_C MHH_CHILD
## 1     254610      3.15      23580      25629     74254      58472     12615
## 2      41233      3.19       3313       3884     12993       9299      2115
## 3      26548      2.39       3913       3970      4061       7381       926
## 4      10058      2.50       1348       1231      2068       3094       413
## 5    3241204      2.98     360530     424398    790374     690291    115984
## 6      43317      3.28       3342       3909     12660      12558      2056
##   FHH_CHILD FAMILIES AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC
## 1     28989   191739       3.61    284367  29757    152828     101782
## 2      4818    31939       3.59     43867   2634     22329      18904
## 3      2051    16255       2.94     35492   8944     17472       9076
## 4       710     6800       2.98     12710   2652      6590       3468
## 5    296976  2194080       3.58   3445076 203872   1544749    1696455
## 6      4020    34093       3.63     49140   5823     27726      15591
##   NO_FARMS07 AVG_SIZE07 CROP_ACR07 AVG_SALE07    SQMI CountyFIPS
## 1       2117       1116     942827    1513.53 8161.35      06103
## 2       1129        603     512870    1203.20 1391.39      06089
## 3        845        147      28997      72.31 1329.46      06106
## 4        459       1000      82567     120.92 4720.42      06086
## 5       1734         63      49158     187.94 4087.19      06073
## 6       1708        398     290683     579.70 2153.29      06102
##                    NEIGHBORS PopNeigh NEIGHBOR_1 PopNeigh_1 NEIGHBOR_2
## 1 San Bernardino,Tulare,Inyo  2495935       <NA>         NA       <NA>
## 2         Fresno,Kern,Tulare  2212260       <NA>         NA       <NA>
## 3                       <NA>        0       <NA>         NA       <NA>
## 4                       <NA>        0       <NA>         NA       <NA>
## 5        San Bernardino,Kern  2874841       <NA>         NA       <NA>
## 6                Mono,Fresno   944652       <NA>         NA       <NA>
##   PopNeigh_2                       geometry
## 1         NA MULTIPOLYGON (((193446 -244...
## 2         NA MULTIPOLYGON (((12524.03 -1...
## 3         NA MULTIPOLYGON (((-240632.1 9...
## 4         NA MULTIPOLYGON (((-45364.03 3...
## 5         NA MULTIPOLYGON (((173874.5 -4...
## 6         NA MULTIPOLYGON (((16681.21 -1...

Let’s calculate the percent of the population that is Hispanic and save it to a new column. Then, we can use that to create a choropleth map.

# calculate percent Hispanic as a new column
counties$pct_hispanic = counties$HISPANIC/counties$POP2012 * 100

# Plot percent Hispanic as choropleth
tm_shape(counties) + 
  tm_polygons(col = 'pct_hispanic',
              palette = 'Blues', 
              style = 'fixed',
              breaks = c(0, 20, 40, 60, 80, 100),
              border.col = "darkgrey",
              lwd = 1.5,
              legend.show = FALSE) + 
tm_add_legend('fill', col = RColorBrewer::brewer.pal(5, "Blues"),
              border.col = "darkgrey",
              title = "Percent Hispanic Population",
              labels = c('<20%',
                         '20% - 40%',
                         '40% - 60%',
                         '60% - 80%',
                         '80% - 100%'))
## Warning: The shape counties is invalid. See sf::st_is_valid

Questions

  1. What new options and operations have we added to our code?

  2. How many values do we specify in the breaks vector, and how many bins are in the map legend? Why?

5.5 Point Maps

Choropleth maps are great, but point maps enable us to visualize our spatial data in another way.

If you know both mapping methods you can expand how much information you can show in one map.

For example, point maps are a great way to map counts because the varying sizes of areas are deemphasized.

The tm_dots element makes it easy to create point maps dynamically from polygon data!

# County population counts as a point map!
tmap_mode('plot')
## tmap mode set to plotting
# Add the county polygon borders as a basemap
tm_shape(counties) + 
  tm_borders(col = "grey") +
  
# Then map the county centroids as points colored by population counts
  tm_shape(counties) + 
  tm_dots(col = 'POP2012',
              palette = 'YlOrRd', 
              style = 'jenks',
              border.col = "black",  # dot borders only visible in interactive mode!
              border.lwd = 1,
              border.alpha = 1,
              size = .5,
              legend.show = TRUE) 
## Warning: The shape counties is invalid. See sf::st_is_valid

## Warning: The shape counties is invalid. See sf::st_is_valid

Converting polygons to points in this way is another useful type of data transformation for making effective maps.

More Point Data Maps

Let’s read in some data that is more typically encoded with point geometry - Alameda County schools.

schools_df <- read.csv(here("notebook_data",
                            "alco_schools.csv"))

head(schools_df)
##           X        Y                      Site               Address    City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900       Bay Farm Elementary   200 Aughinbaugh Way Alameda
## 3 -122.2589 37.76206  Donald D. Lum Elementary    1801 Sandcreek Way Alameda
## 4 -122.2348 37.76525         Edison Elementary  2700 Buena Vista Ave Alameda
## 5 -122.2381 37.75396     Frank Otis Elementary      3010 Fillmore St Alameda
## 6 -122.2616 37.76911       Franklin Elementary  1433 San Antonio Ave Alameda
##   State Type API    Org
## 1    CA   ES 933 Public
## 2    CA   ES 932 Public
## 3    CA   ES 853 Public
## 4    CA   ES 927 Public
## 5    CA   ES 894 Public
## 6    CA   ES 893 Public

Since we read it from a plain CSV file, let’s promote it to an sf data.frame.

schools_sf <- st_as_sf(schools_df, 
                       coords = c('X','Y'),
                       crs = 4326)
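
Setting crs = 4326 tells sf to treat X as longitude and Y as latitude, both in degrees. Before promoting a table, a quick range check can catch swapped or projected coordinates. This sketch uses the first three rows of the head(schools_df) output above; the helper name looks_like_lonlat is hypothetical.

```r
# First three rows copied from head(schools_df)
schools_head <- data.frame(
  X = c(-122.2388, -122.2519, -122.2589),
  Y = c(37.74476, 37.73900, 37.76206),
  Site = c("Amelia Earhart Elementary",
           "Bay Farm Elementary",
           "Donald D. Lum Elementary")
)

# Hypothetical helper: are the values plausible longitudes and latitudes?
looks_like_lonlat <- function(lon, lat) {
  all(lon >= -180 & lon <= 180) && all(lat >= -90 & lat <= 90)
}

ok_as_given <- looks_like_lonlat(schools_head$X, schools_head$Y)  # X as longitude
ok_swapped  <- looks_like_lonlat(schools_head$Y, schools_head$X)  # columns swapped

ok_as_given
ok_swapped
```

The check passes with X as longitude and fails when the columns are swapped, since a value near -122 cannot be a latitude.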

Then we can map it.

plot(schools_sf)

The display above, with one map per column in the dataframe, is useful because at a glance you can see the type of each data variable and get a sense of the range of its values.

The default sf::plot point map for a numeric data column is a proportional color map that linearly scales the color of the point symbol by the data values.

# Point map of API - Academic Performance Index
plot(schools_sf['API'])

Point maps with tmap

Let’s try creating the same map with tmap.

tmap_mode('plot')
## tmap mode set to plotting
tm_shape(schools_sf) + 
  tm_dots(col = "API")

The basic tmap graduated color map needs some customization to shine, especially in plot mode!

By default, tmap uses a yellow-orange-brown (YlOrBr) sequential color palette and the pretty classification scheme for point thematic maps. These are the same defaults that are used for tmap choropleth maps. But point maps that symbolize data values by color are called Graduated Color Maps. In spite of the different map names, the color and classification scheme options are almost identical in tmap! However, some options will be different - for example, a size parameter makes sense for a point radius but not a polygon!

See ?tm_dots for more information about the options for customizing point maps! For example…

# API Graduated Color Map
tm_shape(schools_sf) +
  tm_dots(col = 'API', 
          size = 0.15,
          palette = 'Reds', 
          style = 'fixed',
          breaks = c(0, 200, 400, 600, 800, 1000),
          border.col = 'grey',
          legend.show = FALSE) + 
  tm_add_legend('fill', 
                title = 'Alameda County, school API scores',
                labels = c('<200', 
                           '[200,400)', 
                           '[400,600)', 
                           '[600,800)', 
                           '>800'),
                col = RColorBrewer::brewer.pal(5, "Reds")) +
  tm_layout(legend.position = c('right', 'top'))

Proportional Symbol Maps

Another important type of point map is the proportional symbol map. These are like proportional color maps, but instead of associating symbol color with data values they associate symbol size. You can make these in tmap with the tm_bubbles function.

The schools data does not contain any good variables for proportional symbol mapping, so we will read in a supplemental file of National Center for Education Statistics (NCES) data and join it to the school points.
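
Because a join on school names relies on the names matching exactly, it helps to know how merge() treats names that appear in only one table. The following sketch uses two tiny plain data frames; the first two site names and API scores come from the output above, while "Example Academy" and all other numbers are invented for illustration.

```r
schools <- data.frame(
  Site = c("Amelia Earhart Elementary", "Bay Farm Elementary", "Example Academy"),
  API  = c(933, 932, 800)   # third score is made up
)

nces <- data.frame(
  Site    = c("Amelia Earhart Elementary", "Bay Farm Elementary"),
  STRatio = c(20, 21)       # made-up student-teacher ratios
)

# Default merge: keep only rows whose Site appears in both tables
inner <- merge(schools, nces, by = "Site")

# all.x = TRUE: keep every school, filling NA where no match exists
left <- merge(schools, nces, by = "Site", all.x = TRUE)

nrow(inner)
left[is.na(left$STRatio), "Site"]
```

Comparing the row count before and after a join is a simple way to find out how many schools were dropped because their names did not match.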

df = read.csv(here("notebook_data",
                   "other",
                   "PolicyMap_NCES_Data_20210429.csv"))

# head(df,2)

df2 = df[c('School.Name',
           'Student.Teacher.Ratio',
           'Free.and.Reduced.price.Lunch.Eligible.Students')]

colnames(df2) <- c('Site',
                   'STRatio',
                   'RLunch')

# head(df2,2)

schools_sf2 <- merge(schools_sf, df2, by = "Site")

# head(schools_sf2,2)
tmap_mode('plot')
## tmap mode set to plotting
tm_shape(schools_sf2) + 
  tm_bubbles(size = "RLunch", 
             col = "pink", 
             border.col = 'black', 
             title.size = "Students Eligible for Free/Reduced Lunch") +
  tm_layout(legend.position = c('right',
                                'top'))
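
Proportional symbols are usually scaled so that the area of the symbol, rather than its radius, tracks the data value; scaling the radius directly would exaggerate large values. The sketch below shows that calculation in base R with invented student counts. It is not tmap's internal code, and the helper name value_to_radius is hypothetical.

```r
eligible <- c(50, 200, 800)  # made-up counts of eligible students

# Hypothetical helper: radius grows with the square root of the value,
# so that circle area grows in proportion to the value itself
value_to_radius <- function(x, max_radius = 1) {
  max_radius * sqrt(x / max(x))
}

radius <- value_to_radius(eligible)
area <- pi * radius^2

radius
area / area[1]          # area ratios between symbols
eligible / eligible[1]  # data ratios, which the area ratios reproduce
```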

5.6 Mapping Categorical Data

Mapping categorical data, also called qualitative data, is a bit more straightforward. There is no need to scale or classify data values. The goal of the color map is to provide a contrasting set of colors so as to clearly delineate different categories. Here’s a point-based example:

tm_shape(schools_sf) + 
  tm_dots(col = 'Org', 
          size = 0.15, 
          palette = 'Spectral', 
          title = "School Type")

5.7 Recap

We learned about important data-driven mapping strategies and mapping concepts, including:

  • Choropleth Maps
  • Color Palettes
  • Classification Schemes
  • Point maps

Points and polygons are not the only geometry types that we can use in data-driven mapping! You can also map linear features by associating data values with the color, shape and size of features. But these types of maps are less common.

Exercise: Data-Driven Mapping

Practice creating choropleth and graduated color maps with the counties data. Pick one quantitative variable like MED_AGE and try different color palettes and classification schemes.


 D-Lab @ University of California - Berkeley
 Team Geo